Abstract

We explicitly construct a stationary coupling attaining Ornstein's $\bar{d}$-distance between ordered
pairs of binary chains of infinite order. Our main tool is a representation of the transition
probabilities of the coupled bivariate chain of infinite order as a countable mixture of Markov
transition probabilities of increasing order. Under suitable conditions on the loss of memory of
the chains, this representation implies that the coupled chain can be represented as a concatenation
of iid sequences of bivariate finite random strings of symbols. The perfect simulation
algorithm is based on the fact that we can identify the first regeneration point to the left of the
origin almost surely.

Key words: Ornstein's $\bar{d}$-distance, chains of infinite order, ordered binary chains,
regenerative scheme.



1 Introduction 

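Let $X = (X_n)_{n \in \mathbb{Z}}$ and $Y = (Y_n)_{n \in \mathbb{Z}}$ be two stationary chains of infinite order on the alphabet
$A = \{0, 1\}$. The $\bar{d}$-distance between X and Y is defined as

$$\bar{d}(X, Y) = \inf\left\{ P(\tilde{X}_0 \neq \tilde{Y}_0) : (\tilde{X}, \tilde{Y}) \text{ stationary coupling of } X \text{ and } Y \right\}. \tag{1.1}$$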



The $\bar{d}$-distance was introduced by Ornstein in several papers and summarized in an invited
article in the first issue of The Annals of Probability (Ornstein 1973).

The existence of a stationary coupling attaining the $\bar{d}$-minimum follows from the following basic
topological considerations.

(i) The product space $(A \times A)^{\mathbb{Z}}$ is compact by Tychonoff's Theorem.

(ii) By Prohorov's Theorem, any sequence of probability measures on $(A \times A)^{\mathbb{Z}}$ has a convergent
subsequence in the weak*-topology.

(iii) Also, the set of all stationary couplings of X and Y is a closed subset of the set of all
probability measures on $A^{\mathbb{Z}} \times A^{\mathbb{Z}}$.

(iv) Finally, the Boolean function $\mathbf{1}\{x_0 \neq y_0\}$ that defines the $\bar{d}$-distance is continuous and
bounded.

From (i)-(iv) it follows that there exists at least one coupling which attains the $\bar{d}$-distance. For
more details we refer the reader to Theorem 4.1 in Villani (2009).

Obviously this general reasoning does not enable us to explicitly construct a coupling attaining
the $\bar{d}$-minimum. Despite the large literature devoted to this area, as
far as we know the problem of finding explicit solutions has been addressed only for finite alphabet
Markov chains and for finite volume Gibbs measures. Taking a further step in this direction
is exactly the goal and the novelty of this paper. We solve in a constructive way the problem
of finding a coupling attaining the $\bar{d}$-distance between ordered pairs of binary chains of infinite
order. First, using basic stationarity arguments, we prove that the $\bar{d}$-distance is bounded below
by $|P(Y_0 = 1) - P(X_0 = 1)|$. Next, we present an explicit construction of a stationary coupling
achieving the infimum in (1.1) for stationary chains which are stochastically ordered. This
construction can be effectively implemented in an algorithmic way to perfectly sample from this
minimal $\bar{d}$-coupling.

This article is organized as follows. In Section 2 we introduce the notation and basic definitions.
One coupling that attains the $\bar{d}$-distance is presented in Section 3. The perfect sampling
algorithm is described in Section 4 and a pseudo-code implementing it is given by Algorithm 1.
The proofs of the theorems are presented in Sections 5 and 6. We conclude the paper with a
final discussion and some bibliographic remarks (see Section 7).
2 Basic definitions

In what follows all the processes and sequences of random variables are defined on the same
probability space $(\Omega, \mathcal{B}, P)$.

Let $X = (X_n)_{n \in \mathbb{Z}}$ and $Y = (Y_n)_{n \in \mathbb{Z}}$ be two stationary chains of infinite order (in the sense
of Harris 1955) on the alphabet $A = \{0, 1\}$. Let $p^X$ and $p^Y$ respectively be the transition
probabilities of these chains. This means that for any infinite sequence $x_{-\infty}^{-1} \in A_{-\infty}^{-1}$ and any
symbol $a \in A$ we have

$$P(X_0 = a \mid X_{-\infty}^{-1} = x_{-\infty}^{-1}) = p^X(a \mid x_{-\infty}^{-1}),$$

$$P(Y_0 = a \mid Y_{-\infty}^{-1} = x_{-\infty}^{-1}) = p^Y(a \mid x_{-\infty}^{-1}).$$

In the above formulas $x_{-\infty}^{-1}$ denotes the sequence $(x_j)_{j \le -1}$ and $A_{-\infty}^{-1}$ the set of all such sequences.
These sequences will be called pasts. Given two integers $m \le n$ we will also use the notation
$x_m^n$ to denote the sequence $(x_m, \ldots, x_n)$, and $A_m^n$ to denote the set of such sequences.

In other terms, $p^X$ and $p^Y$ are regular versions of the conditional distributions of $X_0$ and $Y_0$
with respect to the $\sigma$-algebras generated by $X_{-\infty}^{-1}$ and $Y_{-\infty}^{-1}$ respectively.

Given two pasts $x_{-\infty}^{-1}$ and $y_{-\infty}^{-1}$, we will say that $x_{-\infty}^{-1} \le y_{-\infty}^{-1}$ if $x_n \le y_n$ for all $n \le -1$.
This defines a partial order on $A_{-\infty}^{-1}$.

Condition 1: Ordering condition We assume that the chains X and Y are stochastically
ordered in the following sense

$$p^X(1 \mid x_{-\infty}^{-1}) \le p^Y(1 \mid y_{-\infty}^{-1}), \quad \text{whenever } x_{-\infty}^{-1} \le y_{-\infty}^{-1}. \tag{2.1}$$

The stochastic order between $p^X$ and $p^Y$ makes it possible to construct a stationary coupling
between X and Y in such a way that for all $n \in \mathbb{Z}$, $X_n \le Y_n$ with probability 1. This coupling
is a stationary chain taking values in the set

$$S = \{(0,0), (0,1), (1,1)\}.$$

The transition probabilities $P : S \times S_{-\infty}^{-1} \to [0, 1]$ of this chain are defined as follows: for any
pair of ordered pasts $(x_{-\infty}^{-1}, y_{-\infty}^{-1}) \in S_{-\infty}^{-1}$ we have

$$P((1,1) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) = p^X(1 \mid x_{-\infty}^{-1}),$$

$$P((0,0) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) = p^Y(0 \mid y_{-\infty}^{-1}),$$

$$P((0,1) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) = p^X(0 \mid x_{-\infty}^{-1}) - p^Y(0 \mid y_{-\infty}^{-1}). \tag{2.2}$$
We observe that for each pair of ordered pasts $(x_{-\infty}^{-1}, y_{-\infty}^{-1}) \in S_{-\infty}^{-1}$, $P((\cdot, \cdot) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1}))$ is
the optimal coupling between $p^X(\cdot \mid x_{-\infty}^{-1})$ and $p^Y(\cdot \mid y_{-\infty}^{-1})$.
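
As a concrete illustration of (2.2), the following Python sketch builds the three coupled transition probabilities from the two numbers $p^X(1 \mid x_{-\infty}^{-1})$ and $p^Y(1 \mid y_{-\infty}^{-1})$ for one fixed ordered pair of pasts. The function name `coupled_kernel` and the numerical values are hypothetical and chosen only for illustration; exact fractions are used so that the identities can be checked without rounding.

```python
from fractions import Fraction

S = [(0, 0), (0, 1), (1, 1)]

def coupled_kernel(px1, py1):
    """Return the law on S given px1 = p^X(1 | x-past) and py1 = p^Y(1 | y-past)."""
    if not (0 <= px1 <= py1 <= 1):
        raise ValueError("ordering condition requires 0 <= px1 <= py1 <= 1")
    return {
        (1, 1): px1,        # both chains emit 1
        (0, 0): 1 - py1,    # both chains emit 0, i.e. p^Y(0 | y-past)
        (0, 1): py1 - px1,  # p^X(0 | x-past) - p^Y(0 | y-past): the only mismatch
    }

# Hypothetical values for one ordered pair of pasts.
kernel = coupled_kernel(Fraction(1, 4), Fraction(2, 3))
mismatch = kernel[(0, 1)]
```

In this sketch the first coordinate has marginal `px1`, the second has marginal `py1`, and the mismatch probability equals `py1 - px1`, which is the smallest value compatible with those two marginals.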

We want to construct a chain of infinite order on S invariant with respect to P. This can
be done using a regenerative construction of the chain. This regenerative construction is based
on a decomposition theorem which states that the stationary chain with infinite memory can
be constructed by choosing at each step, in an iid way, the length of the suffix of the string of
past symbols we need to look at in order to sample the next symbol.

These results hold under the following conditions on the transition probabilities:

Condition 2: Continuity condition The transition probabilities $p^X$ and $p^Y$ on A are
continuous, that is, the continuity rates satisfy

$$\max\{\beta^X(k), \beta^Y(k)\} \to 0 \quad \text{as } k \to \infty,$$

where the continuity rate $\beta^X(k)$ is defined as

$$\beta^X(k) = \max_{a \in A} \sup\left\{ |p^X(a \mid x_{-\infty}^{-1}) - p^X(a \mid y_{-\infty}^{-1})| : x_{-\infty}^{-1}, y_{-\infty}^{-1} \text{ with } x_{-k}^{-1} = y_{-k}^{-1} \right\}, \tag{2.3}$$

and similarly for $\beta^Y(k)$.

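To state our third condition we need some extra notation. For each pair $(a, b) \in S$
and each fixed ordered pair of pasts $(x_{-\infty}^{-1}, y_{-\infty}^{-1}) \in S_{-\infty}^{-1}$, we define a non-decreasing sequence
$r_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1}))$ such that

$$r_0((a,b)) = \inf\left\{ P((a,b) \mid (u_{-\infty}^{-1}, v_{-\infty}^{-1})) : (u_{-\infty}^{-1}, v_{-\infty}^{-1}) \in S_{-\infty}^{-1} \right\} \tag{2.4}$$

and for $k \ge 1$, $r_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1}))$ is defined by

$$\inf\left\{ P((a,b) \mid (u_{-\infty}^{-1}, v_{-\infty}^{-1})) : (u_{-\infty}^{-1}, v_{-\infty}^{-1}) \in S_{-\infty}^{-1},\ u_{-k}^{-1} = x_{-k}^{-1},\ v_{-k}^{-1} = y_{-k}^{-1} \right\}. \tag{2.5}$$

We then define the non-decreasing sequence $(\alpha_k, k \in \mathbb{N})$ by

$$\alpha_0 = \sum_{(a,b) \in S} r_0((a,b)), \tag{2.6}$$

and for $k \ge 1$

$$\alpha_k((x_{-k}^{-1}, y_{-k}^{-1})) = \sum_{(a,b) \in S} r_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1})) \tag{2.7}$$

and

$$\alpha_k = \inf\left\{ \alpha_k((x_{-k}^{-1}, y_{-k}^{-1})) : (x_{-k}^{-1}, y_{-k}^{-1}) \in S_{-k}^{-1} \right\}. \tag{2.8}$$

Condition 3:

$$\prod_{k \ge 0} \alpha_k > 0. \tag{2.9}$$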

To better understand Conditions 2 and 3 we will look at an interesting class of examples,
the renewal processes that forget the past every time they meet the symbol 1. Take
$p^X(1 \mid x_{-\infty}^{-1}) = q^X_{\ell(x_{-\infty}^{-1})}$ and $p^Y(1 \mid y_{-\infty}^{-1}) = q^Y_{\ell(y_{-\infty}^{-1})}$, where $\ell(u_{-\infty}^{-1}) = \inf\{n \ge 1 : u_{-n} = 1\}$.

Example 1: If $\lim_{k \to \infty} q^X_k = q^X_\infty > 0$ and $\lim_{k \to \infty} q^Y_k = q^Y_\infty > 0$ exist, then Condition 2
is satisfied. On the other hand, if we take $q^X_0 = q^X_{2k} \neq q^X_{2k+1} = q^X_1$ with $0 < q^X_0 < q^X_1 < 1$,
Condition 2 is not satisfied.

Example 2: If $q^X_k \searrow q^X_\infty > 0$ and $q^Y_k \searrow q^Y_\infty > 0$ as $k \to \infty$, Condition 3 is equivalent
to

$$\prod_{n} (1 - q^X_n + q^X_\infty)(1 - q^Y_n + q^Y_\infty) > 0.$$

For instance, it is enough to have $\sum_n (q^X_n - q^X_\infty) = +\infty$ or $\sum_n (q^Y_n - q^Y_\infty) = +\infty$ to break
Condition 3.

3 Construction of our coupling

The goal of this section is to present a coupling between the chains $(X_n)_n$ and $(Y_n)_n$ that attains
the $\bar{d}$-distance given by $|P(Y_0 = 1) - P(X_0 = 1)|$. To obtain such a coupling Conditions 1-3 are
required. Therefore, we assume from now on that they are satisfied.

To start the construction we first decompose the transition probability P given by (2.2) as
a convex combination of increasing order finite Markov kernels $P_k$ on $S \times S_{-k}^{-1}$ for $k \ge 1$.

Let us define a probability distribution $(\lambda_k, k \in \mathbb{N})$ as follows.

$$\lambda_0 = \alpha_0 \tag{3.1}$$

and for $k \ge 1$

$$\lambda_k = \alpha_k - \alpha_{k-1}. \tag{3.2}$$

The fact that $(\lambda_k, k \in \mathbb{N})$ is a probability distribution follows from the fact that $\alpha_k \to 1$ as
k diverges, which in turn follows from Condition 2.
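
To make (3.1) and (3.2) concrete, here is a minimal Python sketch that turns a non-decreasing sequence $\alpha_0 \le \alpha_1 \le \ldots$ into the weights $\lambda_k$ and evaluates the product appearing in Condition 3. The helper name `lambdas_from_alphas` and the four-term sequence are hypothetical; the sequence is assumed to reach 1 after finitely many steps, which corresponds to a chain with finite memory and is only meant as an illustration.

```python
from fractions import Fraction
from math import prod

def lambdas_from_alphas(alphas):
    """lambda_0 = alpha_0 and lambda_k = alpha_k - alpha_{k-1} for k >= 1."""
    if any(b < a for a, b in zip(alphas, alphas[1:])):
        raise ValueError("alphas must be non-decreasing")
    return [alphas[0]] + [b - a for a, b in zip(alphas, alphas[1:])]

# Hypothetical finite-memory example: alpha_k reaches 1 at k = 3.
alphas = [Fraction(1, 2), Fraction(3, 4), Fraction(7, 8), Fraction(1)]
lambdas = lambdas_from_alphas(alphas)

# Product of the alpha_k, which Condition 3 requires to be positive.
alpha_product = prod(alphas)
```

Because the differences telescope, the weights sum to the last value of the sequence, so they form a probability distribution exactly when $\alpha_k$ reaches (or, in the infinite case, tends to) 1.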

Theorem 3.3 There exists a sequence of transition probabilities $P_k$ on $S \times S_{-k}^{-1}$ for $k \ge 1$ and
a probability measure $P_0$ on S such that for any pair of symbols (a, b) in S and any ordered pair
of pasts $(x_{-\infty}^{-1}, y_{-\infty}^{-1}) \in S_{-\infty}^{-1}$ we have

$$P((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) = \lambda_0 P_0((a,b)) + \sum_{k=1}^{\infty} \lambda_k P_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1})). \tag{3.4}$$

This decomposition allows us to construct simultaneously the pair of chains $(X_n, Y_n)_{n \in \mathbb{Z}}$
taking values in S by concatenating bivariate iid strings. This is done as follows.

Let $L = \{L_n, n \in \mathbb{Z}\}$ be an iid sequence of random variables such that $P(L_n = k) = \lambda_k$,
where $(\lambda_k, k \in \mathbb{N})$ is given by (3.1) and (3.2). Define also

$$T_0 = \sup\{z \le 0;\ L_{z+m} \le m \text{ for all } m \ge 0\}$$

and for $n \ge 1$

$$T_{-n} = \sup\{z < T_{-n+1};\ L_{z+m} \le m \text{ for all } m \ge 0\}$$

and

$$T_n = \inf\{z > T_{n-1};\ L_{z+m} \le m \text{ for all } m \ge 0\}.$$

Given the random variables $L = \{L_n, n \in \mathbb{Z}\}$ and $T = \{T_j, j \in \mathbb{Z}\}$, we construct the bivariate
chain $\{(X_n, Y_n), n \in \mathbb{Z}\}$ by concatenating the bivariate strings $(X_{T_j}^{T_{j+1}-1}, Y_{T_j}^{T_{j+1}-1})$. Each one of
these strings is constructed as follows.

1. Choose $(X_{T_j}, Y_{T_j}) \in S$ with probability $P_0$ independently of the past.

2. For any $T_j < n \le T_{j+1} - 1$ choose $(X_n, Y_n) \in S$ with probability

$$P_{L_n}\left((\cdot, \cdot) \mid (X_{n-L_n}^{n-1} = x_{n-L_n}^{n-1}, Y_{n-L_n}^{n-1} = y_{n-L_n}^{n-1})\right).$$

Observe that if $T_j < n < T_{j+1}$ then $n - L_n \ge T_j$ and therefore the choice of the pair $(X_n, Y_n)$
is made independently of the choice of the symbols $(X_{-\infty}^{T_j - 1}, Y_{-\infty}^{T_j - 1})$. In this construction, the
transition probabilities $P_k$ are those appearing in Expression (3.4).

The existence of infinitely many finite renewal points $T_n$ is given in the next theorem.

Theorem 3.5 The sequence of random times $T = (T_n, n \in \mathbb{Z})$ with $\ldots < T_{-1} < T_0 \le 0 < T_1 < T_2 < \ldots$
satisfies

(i) P-almost surely, all the random times $\ldots < T_{-1} < T_0 \le 0 < T_1 < T_2 < \ldots$ are finite.

(ii) The random pairs of strings $(X_{T_i}^{T_{i+1}-1}, Y_{T_i}^{T_{i+1}-1})$, $i \in \mathbb{Z}$, are mutually independent and
identically distributed.

We can now present a stationary coupling attaining the $\bar{d}$-distance. This coupling is obtained by
concatenating the independent strings $(X_{T_i}^{T_{i+1}-1}, Y_{T_i}^{T_{i+1}-1})$, $i \in \mathbb{Z}$. For this coupling we have the
following theorem.

Theorem 3.6 The coupling obtained by concatenating the independent strings $(X_{T_i}^{T_{i+1}-1}, Y_{T_i}^{T_{i+1}-1})$, $i \in \mathbb{Z}$,
attains the $\bar{d}$-distance between X and Y.

4 Perfect simulation algorithm 

Given two fixed times $m \le n$, we want to perfectly sample $(X_m^n, Y_m^n)$ according to our minimal
$\bar{d}$-coupling between the chains X and Y described in Section 3.

There is an obvious difficulty: we cannot identify a regeneration point experimentally. This
follows from the fact that, for any $j \in \mathbb{Z}$, the event "j is a regeneration point" is measurable
with respect to the $\sigma$-algebra generated by the random variables $L_{j+k}$, $k \ge 0$, so that deciding
whether it occurs requires inspecting infinitely many of these variables.

This difficulty will be overcome by Algorithm 1, whose pseudo-code is given below. Algorithm 1
will produce a sequence $(\hat{X}_m^n, \hat{Y}_m^n)$ as follows. We sequentially choose iid random variables
$L_s$, $s = n, n-1, \ldots$, with distribution $P(L_s = k) = \lambda_k$ given by (3.1) and (3.2). The algorithm
checks every time $t \le m$, until it finds the first one which has the property that

$$L_s \le s - t, \quad \text{for all } s = t, \ldots, n.$$

Call $T[m,n]$ the first $t \le m$ which has this property:

$$T[m,n] = \sup\{t \le m;\ L_s \le s - t \text{ for all } s = t, \ldots, n\}.$$

The random time $T[m,n]$ indicates how far back into the past we have to look in order to
construct $(\hat{X}_m^n, \hat{Y}_m^n)$.
In other terms, if $T[m,n] = t$ then we can choose $(X_t, Y_t)$ independently of the past with
distribution $P_0$. Moreover, the next pair $(X_{t+1}, Y_{t+1})$ can be chosen using the distribution $P_{L_{t+1}}$,
which depends on the past at most through $(X_t, Y_t)$ because $L_{t+1} \le 1$, and recursively we can choose
the whole sequence $(X_t^n, Y_t^n)$ without knowledge of the symbols occurring before time $T[m,n]$.
The kernels $P_0$ and $P_k$ are defined as in Theorem 3.3.
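
The backward search for $T[m,n]$ followed by the forward sampling described above can be sketched in Python as follows. This is a simplified two-phase sketch under stated assumptions, not a transcription of Algorithm 1: the names `find_T`, `perfect_sample`, `sample_L` and `kernel` are hypothetical, `kernel(k, past)` is assumed to return the law $P_k(\cdot \mid \text{past})$ as a dictionary over S, and `toy_kernel` is an arbitrary order-1 example that is not derived from any specific pair of chains.

```python
import random

S = [(0, 0), (0, 1), (1, 1)]

def find_T(m, n, sample_L):
    """Return (T[m, n], L): the largest t <= m with L[s] <= s - t for s = t..n."""
    L = {}                       # each L_s is drawn once and then reused
    t = m
    while True:
        for s in range(t, n + 1):
            if s not in L:
                L[s] = sample_L()
            if L[s] > s - t:     # time s looks back beyond t: reject this t
                break
        else:
            return t, L          # no violation found, so t equals T[m, n]
        t -= 1

def perfect_sample(m, n, sample_L, kernel, rng):
    """Sample the pairs at times m..n; kernel(k, past) is a dict over S."""
    t, L = find_T(m, n, sample_L)
    pairs = {}
    for s in range(t, n + 1):
        k = L[s]
        past = tuple(pairs[j] for j in range(s - k, s))  # last k pairs, oldest first
        dist = kernel(k, past)
        pairs[s] = rng.choices(list(dist), weights=list(dist.values()))[0]
    return [pairs[s] for s in range(m, n + 1)], t

def toy_kernel(k, past):
    """Hypothetical kernels P_0 and P_1 on S, used only to exercise the code."""
    if k == 0:
        return {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
    if past[-1] == (1, 1):
        return {(0, 0): 0.2, (0, 1): 0.2, (1, 1): 0.6}
    return {(0, 0): 0.6, (0, 1): 0.2, (1, 1): 0.2}

rng = random.Random(0)
sample_L = lambda: rng.choices([0, 1], weights=[0.5, 0.5])[0]  # lambda_0 = lambda_1 = 1/2
sample, T = perfect_sample(-2, 3, sample_L, toy_kernel, rng)
```

Since $L_s \le s - t$ for every $s$ between $t = T[m,n]$ and $n$, the index `s - k` in the forward pass never falls below `t`, so every symbol the kernel needs has already been generated.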

The sequence $(\hat{X}_m^n, \hat{Y}_m^n)$ produced by Algorithm 1 in a finite number of steps depends on
the particular choice of the random variables $L_j$, $j = T[m,n], \ldots, n$. Let us call this choice
$\hat{l}_j$, $j = T[m,n], \ldots, n$. On the other hand, the sequence $(X_m^n, Y_m^n)$ produced by the theoretical
construction presented in Section 3 depends on the choice of $L_j$, $j \in \mathbb{Z}$. Let us call $l_j$, $j \in \mathbb{Z}$,
this choice. The important point to stress is that if $\hat{l}_j = l_j$ for $j = T[m,n], \ldots, n$, then
$(\hat{X}_m^n, \hat{Y}_m^n) = (X_m^n, Y_m^n)$. This is the content of the following theorem.

Theorem 4.1 Under Conditions 1-3, for the decomposition given by (3.4), for every pair of
integers $m \le n$, we have:

(a) $T[m,n]$ is a.s. finite.

(b) The event $\{T[m,n] = t\}$ is measurable with respect to the $\sigma$-algebra generated by the random
variables $L_s$, $t \le s \le n$.

(c) Algorithm 1 stops almost surely after a finite number of steps.

(d) The sequence $(\hat{X}_m^n, \hat{Y}_m^n)$ produced by Algorithm 1 is a perfect sample of the minimal
$\bar{d}$-coupling between the chains X and Y described in Section 3.



5 Proofs of Theorems 3.3 and 3.5 



Proof of Theorem 3.3

Before starting the proof let us sketch its main ideas. Given an ordered pair of "past" strings
$(x_{-\infty}^{-1}, y_{-\infty}^{-1})$, we want to randomly choose a new random pair of symbols $(a, b) \in S$ according
to $P((\cdot, \cdot) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1}))$. This random choice can be performed as follows. First make a partition
$\{I((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})), (a,b) \in S\}$ of the interval [0, 1] where the length of $I((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1}))$
is equal to $P((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1}))$. Then, choose a random element $\xi$ uniformly distributed in
Algorithm 1 Perfect simulation for a minimal $\bar{d}$-coupling
Require: Two integers $m \le n$.
Ensure: The bivariate string $(\hat{X}_m^n, \hat{Y}_m^n)$ and the past time $T[m,n]$.

B ← ∅   {B is the set of time positions s for which the pair $(X_s, Y_s)$ has already been chosen}
t ← m
s ← t
while s ≤ n do
    if s ∉ B then
        if $L_s$ has not been chosen yet then
            choose $L_s$ with distribution $P(L_s = k) = \lambda_k$ independently of everything
        end if
        if $L_s > s - t$ then
            t ← t − 1
            s ← t
        else
            choose $(X_s, Y_s)$ with distribution $P_{L_s}((\cdot, \cdot) \mid (X_{s-L_s}^{s-1}, Y_{s-L_s}^{s-1}))$
            B ← B ∪ {s}
            s ← s + 1
        end if
    else
        s ← s + 1
    end if
end while
T[m,n] ← t
return $(\hat{X}_m^n, \hat{Y}_m^n)$, $T[m,n]$
[0, 1]. If $\xi \in I((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1}))$, then choose (a, b) as the new pair of symbols. It turns out
that $I((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1}))$ can be decomposed as the following disjoint union

$$I((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) = I_0((a,b)) \cup \bigcup_{k \ge 1} I_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1})), \tag{5.1}$$

where the lengths of $I_0((a,b))$ and $I_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1}))$ are suitably chosen. Loosely speaking, the
total length of the intervals $I_0((a,b)), \ldots, I_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1}))$ is the smallest probability to choose (a, b) for any pair
of ordered pasts having $(x_{-k}^{-1}, y_{-k}^{-1})$ as ending sequence.

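We can consider a second different partition of [0, 1] by using the increasing sequence
$0 < \alpha_0 < \alpha_1 < \cdots$. The length of the kth element of this partition is precisely $\lambda_k$. Loosely speaking,
if $\xi$ falls in this interval, then we only need to look at the last k symbols of the past.

Formally this is done as follows. Let us define a partition of the interval [0, 1] formed by the
disjoint intervals

$$I_0((0,0)),\ I_0((0,1)),\ I_0((1,1)),\ I_1((0,0) \mid (x_{-1}, y_{-1})),\ I_1((0,1) \mid (x_{-1}, y_{-1})),\ I_1((1,1) \mid (x_{-1}, y_{-1})),$$

and for $k > 1$,

$$I_k((0,0) \mid (x_{-k}^{-1}, y_{-k}^{-1})),\ I_k((0,1) \mid (x_{-k}^{-1}, y_{-k}^{-1})),\ I_k((1,1) \mid (x_{-k}^{-1}, y_{-k}^{-1})),$$

disposed in the above order in such a way that the left extreme of one interval coincides with
the right extreme of the preceding one. These intervals have length

$$|I_0((a,b))| = r_0((a,b)) \tag{5.2}$$

and for $k \ge 1$,

$$|I_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1}))| = r_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1})) - r_{k-1}((a,b) \mid (x_{-k+1}^{-1}, y_{-k+1}^{-1})). \tag{5.3}$$

Notice that the continuity of the transition probabilities $p^X$ and $p^Y$ implies that

$$r_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1})) \to P((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) \tag{5.4}$$

as k diverges.

By construction,

$$P((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) = |I_0((a,b))| + \sum_{k \ge 1} |I_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1}))|. \tag{5.5}$$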
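Therefore, we can simulate $P((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1}))$ by using an auxiliary random variable $\xi$
uniformly distributed on [0, 1] as

$$P((a,b) \mid (x_{-\infty}^{-1}, y_{-\infty}^{-1})) = P\Big(\xi \in I_0((a,b)) \cup \bigcup_{j \ge 1} I_j((a,b) \mid (x_{-j}^{-1}, y_{-j}^{-1}))\Big). \tag{5.6}$$

Observe that the right hand side of this equality can be rewritten as

$$\sum_{k \ge 0} P(\xi \in [\alpha_{k-1}, \alpha_k))\, P\Big(\xi \in I_0((a,b)) \cup \bigcup_{j \ge 1} I_j((a,b) \mid (x_{-j}^{-1}, y_{-j}^{-1})) \,\Big|\, \xi \in [\alpha_{k-1}, \alpha_k)\Big) \tag{5.7}$$

where $\alpha_{-1} = 0$.

By construction,

$$[0, \alpha_k) \cap \bigcup_{j > k} I_j((a,b) \mid (x_{-j}^{-1}, y_{-j}^{-1})) = \emptyset.$$

In other terms, for each k, the conditional probabilities on the right hand side of (5.7) depend
on the suffix $(x_{-k}^{-1}, y_{-k}^{-1})$ and not on the remaining terms $(x_{-\infty}^{-(k+1)}, y_{-\infty}^{-(k+1)})$. Moreover,

$$\sum_{(a,b) \in S} P\Big(\xi \in I_0((a,b)) \cup \bigcup_{j \ge 1} I_j((a,b) \mid (x_{-j}^{-1}, y_{-j}^{-1})) \,\Big|\, \xi \in [\alpha_{k-1}, \alpha_k)\Big) = 1.$$

Therefore, we are entitled to define the order k Markov probability transitions $P_k$ as

$$P_k((a,b) \mid (x_{-k}^{-1}, y_{-k}^{-1})) = P\Big(\xi \in I_0((a,b)) \cup \bigcup_{j=1}^{k} I_j((a,b) \mid (x_{-j}^{-1}, y_{-j}^{-1})) \,\Big|\, \xi \in [\alpha_{k-1}, \alpha_k)\Big). \tag{5.8}$$

Finally we define the probability distribution $(\lambda_k, k \in \mathbb{N})$ as follows.

$$\lambda_0 = P(\xi \in [0, \alpha_0)) = \alpha_0 \tag{5.9}$$

and for $k \ge 1$

$$\lambda_k = P(\xi \in [\alpha_{k-1}, \alpha_k)) = \alpha_k - \alpha_{k-1}. \tag{5.10}$$

This concludes the proof. □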
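
The decomposition (3.4) can be checked by hand in the simplest non-trivial case. The Python sketch below assumes, purely for illustration, that both chains are Markov of order 1, so that the coupled kernel depends only on the last pair of symbols, $r_1$ coincides with P, $\alpha_1 = 1$ and only $\lambda_0$ and $\lambda_1$ are non-zero. The transition values `pX1` and `pY1` are hypothetical; the kernels `P0` and `P1` are obtained by normalising the lengths $r_0$ and $P - r_0$ of the corresponding intervals.

```python
from fractions import Fraction as F

S = [(0, 0), (0, 1), (1, 1)]
pX1 = {0: F(1, 5), 1: F(3, 5)}   # hypothetical p^X(1 | x_{-1})
pY1 = {0: F(2, 5), 1: F(4, 5)}   # hypothetical p^Y(1 | y_{-1}); ordered w.r.t. pX1

def P(cur, prev):
    """Coupled kernel (2.2) when only the last pair prev = (x_{-1}, y_{-1}) matters."""
    x, y = prev
    return {(1, 1): pX1[x], (0, 0): 1 - pY1[y], (0, 1): pY1[y] - pX1[x]}[cur]

# r_0 is the infimum over all pasts; here a minimum over the three last pairs.
r0 = {c: min(P(c, s) for s in S) for c in S}
alpha0 = sum(r0.values())
lam0, lam1 = alpha0, 1 - alpha0            # lambda_0 and lambda_1 (alpha_1 = 1)

P0 = {c: r0[c] / alpha0 for c in S}         # memoryless kernel
P1 = {s: {c: (P(c, s) - r0[c]) / lam1 for c in S} for s in S}  # order-1 kernel

# Recombine the mixture and compare with the original kernel.
recombined = {s: {c: lam0 * P0[c] + lam1 * P1[s][c] for c in S} for s in S}
```

With these values `alpha0` equals 3/5, so with probability 3/5 the next pair is drawn without looking at the past at all, which is exactly what makes regeneration possible.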

Proof of Theorem 3.5

Define the event $B_n$ as "n is a regeneration point". Formally,

$$B_n = \bigcap_{m \ge 0} \{L_{n+m} \le m\}. \tag{5.11}$$

Observe that

$$\Big( \bigcap_{N \ge 1} \bigcup_{n \ge N} B_n \Big) \cap \Big( \bigcap_{N \le 0} \bigcup_{n \le N} B_n \Big) = \bigcap_{k \ge 1} \{T_k < +\infty\} \cap \bigcap_{k \le 0} \{T_k > -\infty\}. \tag{5.12}$$

Therefore, the existence of infinitely many regeneration times $T_n$ will follow from the following
lemma.

Lemma 5.13 Assume that $\alpha = \prod_{j \ge 0} \alpha_j > 0$. Then, for any $N \in \mathbb{Z}$,

$$P\Big( \bigcup_{n=N}^{+\infty} B_n \Big) = 1.$$



Proof. For any $n \in \mathbb{Z}$ define

$$F_n^0 = \{L_n > 0\}$$

and for $m \ge 1$

$$F_n^m = \bigcap_{i=0}^{m-1} \{L_{n+i} \le i\} \cap \{L_{n+m} > m\}.$$

Define

$$D_N^1 = B_N$$

and for $k \ge 2$

$$D_N^k = \bigcup_{n_1 = N+1}^{+\infty} \cdots \bigcup_{n_{k-1} = n_{k-2}+1}^{+\infty} \Big( F_N^{n_1 - N - 1} \cap \ldots \cap F_{n_{k-2}}^{n_{k-1} - n_{k-2} - 1} \cap B_{n_{k-1}} \Big).$$

How to interpret $F_N^m$? Assume $L_N = 0$ and therefore, we can choose $(X_N, Y_N)$ independently
of the past symbols $(X_{-\infty}^{N-1}, Y_{-\infty}^{N-1})$. From this point on, we look at the values of $L_{N+j}$ and we
can choose $(X_{N+j}, Y_{N+j})$ using only the knowledge of $(X_N^{N+j-1}, Y_N^{N+j-1})$. This sequence breaks
down at $j = m$, since $L_{N+m} > m$ and therefore, the choice of $(X_{N+m}, Y_{N+m})$ depends on the
knowledge of symbols occurring before time N.

Therefore, $D_N^k$ is the event in which the trials, described above, starting from time N
fail exactly $k - 1$ times before finally we find the starting point of a string which is entirely
independent of the past symbols. Therefore, the events $D_N^k$, $k = 1, 2, \ldots$ are disjoint and

$$\bigcup_{n=N}^{+\infty} B_n = \bigcup_{k=1}^{+\infty} D_N^k.$$

Therefore

$$P\Big( \bigcup_{n=N}^{+\infty} B_n \Big) = \sum_{k=1}^{+\infty} P(D_N^k).$$

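Due to the fact that the random lengths $\{L_n, n \in \mathbb{Z}\}$ are identically distributed, the
probabilities computed above do not depend on the specific choice of N. By definition

$$P(D_N^k) = \sum_{n_1 = N+1}^{+\infty} \cdots \sum_{n_{k-1} = n_{k-2}+1}^{+\infty} P\Big( F_N^{n_1 - N - 1} \cap \ldots \cap F_{n_{k-2}}^{n_{k-1} - n_{k-2} - 1} \cap B_{n_{k-1}} \Big).$$

Using the independence of $F_N^{n_1 - N - 1}, \ldots, F_{n_{k-2}}^{n_{k-1} - n_{k-2} - 1}$ and $B_{n_{k-1}}$ whenever $N < n_1 < \ldots < n_{k-1}$, we
can rewrite the right hand side of the last expression as

$$P(D_N^k) = \sum_{n_1 = N+1}^{+\infty} \cdots \sum_{n_{k-1} = n_{k-2}+1}^{+\infty} P\big(F_N^{n_1 - N - 1}\big) \cdots P\big(F_{n_{k-2}}^{n_{k-1} - n_{k-2} - 1}\big)\, P\big(B_{n_{k-1}}\big).$$

Since $L_n$, $n \in \mathbb{Z}$, are iid random variables with $P(L_0 \le m) = \alpha_m$, for any n, we have

$$P(B_n) = P\Big( \bigcap_{m \ge 0} \{L_{n+m} \le m\} \Big) = \prod_{m \ge 0} \alpha_m = \alpha$$

and

$$\sum_{l = n+1}^{+\infty} P\big(F_n^{l - n - 1}\big) = 1 - \alpha.$$

Therefore, for any $k \ge 1$ we have

$$P(D_N^k) = \alpha (1 - \alpha)^{k-1}$$

and

$$P\Big( \bigcup_{n=N}^{+\infty} B_n \Big) = \sum_{k=1}^{+\infty} \alpha (1 - \alpha)^{k-1} = 1.$$
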
This concludes the proof of the lemma. □ 
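
The two identities used in this proof, $P(B_n) = \prod_m \alpha_m$ and the geometric sum, can be verified exactly when L has bounded support. The Python sketch below assumes a hypothetical law `lam` on {0, 1, 2}; in that case $B_n$ only constrains $L_n$ and $L_{n+1}$, so the probability can also be obtained by brute-force enumeration.

```python
from fractions import Fraction as F
from itertools import product
from math import prod

lam = [F(1, 2), F(1, 4), F(1, 4)]                   # hypothetical law of L on {0, 1, 2}
K = len(lam) - 1
alpha = [sum(lam[: m + 1]) for m in range(K + 1)]   # alpha_m = P(L <= m)

# Product formula from the proof: P(B_n) is the product of the alpha_m.
by_formula = prod(alpha)

# Brute force: since L <= K always, B_n only constrains L_n, ..., L_{n+K-1}.
by_enumeration = sum(
    prod(lam[l] for l in ls)
    for ls in product(range(K + 1), repeat=K)
    if all(l <= m for m, l in enumerate(ls))
)

# Partial sums of the geometric series sum_k a (1 - a)^(k - 1).
a = by_formula
N = 10
partial = sum(a * (1 - a) ** (k - 1) for k in range(1, N + 1))
```

The partial sum equals $1 - (1 - a)^N$, which tends to 1 as soon as $a > 0$; this is where the positivity of the product in the lemma's assumption is used.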
Lemma 5.13 and the stationarity of the events $B_n$ imply that, for any $N \in \mathbb{Z}$,

$$P\Big( \bigcup_{n=-\infty}^{N} B_n \Big) = 1.$$

Observe that for each n, if $B_n$ occurs, then $(X_n^{+\infty}, Y_n^{+\infty})$ can be chosen independently
from the past symbols $(X_{-\infty}^{n-1}, Y_{-\infty}^{n-1})$. This concludes the proof of Theorem 3.5. □
6 Proofs of Theorems 3.6 and 4.1



We begin with a lemma giving a lower bound for the $\bar{d}$-distance between stationary binary
chains. For this lemma we are not assuming that the chains are ordered.

Lemma 6.1 Let $X = (X_n)_{n \in \mathbb{Z}}$ and $Y = (Y_n)_{n \in \mathbb{Z}}$ be any two stationary chains on {0, 1}. Then

$$\bar{d}(X, Y) \ge |P(Y_0 = 1) - P(X_0 = 1)|.$$

Proof. The set of all stationary chains $(X'_n, Y'_n)_{n \in \mathbb{Z}}$ taking values in $\{0, 1\}^2$ such that $P(X'_n = 1) = P(X_n = 1)$
and $P(Y'_n = 1) = P(Y_n = 1)$ contains the set of all stationary couplings between
$(X_n)_n$ and $(Y_n)_n$. Therefore,

$$\inf\left\{ P(\tilde{X}_0 \neq \tilde{Y}_0) : (\tilde{X}, \tilde{Y}) \text{ stationary coupling of } X \text{ and } Y \right\}$$

is greater than or equal to

$$\inf\left\{ P(X'_0 \neq Y'_0) : (X'_0, Y'_0) \text{ such that } X'_0 \stackrel{d}{=} X_0 \text{ and } Y'_0 \stackrel{d}{=} Y_0 \right\}.$$

It is a straightforward computation to check that this last term reaches its minimum with the
following optimal coupling between $X_0$ and $Y_0$. For any $a \in \{0, 1\}$, take

$$P((X'_0, Y'_0) = (a, a)) = \min\{P(X_0 = a), P(Y_0 = a)\},$$
$$P((X'_0, Y'_0) = (a, 1 - a)) = P(X_0 = a) - P((X'_0, Y'_0) = (a, a)).$$

For this coupling $P(X'_0 \neq Y'_0) = 1 - \sum_{a} \min\{P(X_0 = a), P(Y_0 = a)\} = |P(Y_0 = 1) - P(X_0 = 1)|$.

□ 
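
The optimal coupling used in this proof is easy to compute explicitly. The following Python sketch, with the hypothetical helper name `optimal_coupling` and illustrative marginals, puts mass $\min\{P(X_0 = a), P(Y_0 = a)\}$ on each diagonal pair (a, a) and the remaining mass of $X_0 = a$ on (a, 1 − a), then reads off the mismatch probability.

```python
from fractions import Fraction as F

def optimal_coupling(px1, py1):
    """Joint law of (X0', Y0') given px1 = P(X0 = 1) and py1 = P(Y0 = 1)."""
    pX = {0: 1 - px1, 1: px1}
    pY = {0: 1 - py1, 1: py1}
    joint = {}
    for a in (0, 1):
        joint[(a, a)] = min(pX[a], pY[a])          # agree as often as possible
        joint[(a, 1 - a)] = pX[a] - joint[(a, a)]  # leftover mass of X0 = a
    return joint

# Hypothetical marginals, chosen only for illustration.
joint = optimal_coupling(F(1, 4), F(2, 3))
mismatch = joint[(0, 1)] + joint[(1, 0)]
```

For these marginals the pair (1, 0) receives no mass, so the coupling is supported on the ordered set {(0,0), (0,1), (1,1)}.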

Now we show that for ordered binary stationary chains $|P(Y_0 = 1) - P(X_0 = 1)|$ is also an
upper bound for $\bar{d}(X, Y)$.

Consider the coupling obtained by concatenating the independent strings as described in
Section 3. Theorems 3.3 and 3.5 imply that the process $(X_n, Y_n)_{n \in \mathbb{Z}}$ taking values in S is
stationary. As a consequence

• the chains $(X_n)_{n \in \mathbb{Z}}$ and $(Y_n)_{n \in \mathbb{Z}}$ constructed simultaneously by the algorithm are also
stationary,

• $(X_0, Y_0)$ is a coupling of the probabilities $P(X_0 = \cdot)$ and $P(Y_0 = \cdot)$,

• moreover by construction $X_0 \le Y_0$.
There exists a unique optimal coupling between $P(X_0 = \cdot)$ and $P(Y_0 = \cdot)$ satisfying the order
condition $X_0 \le Y_0$:

$$P\{(X_0, Y_0) = (0,0)\} = P(Y_0 = 0),$$
$$P\{(X_0, Y_0) = (1,1)\} = P(X_0 = 1),$$
$$P\{(X_0, Y_0) = (0,1)\} = P(X_0 = 0) - P(Y_0 = 0).$$

With this coupling we have

$$P\{X_0 \neq Y_0\} = P(Y_0 = 1) - P(X_0 = 1). \tag{6.2}$$

Equality (6.2) together with Lemma 6.1 concludes the proof of Theorem 3.6. □

To prove Theorem 4.1 let us assume without loss of generality that m = 0.
Assertion (a) follows from the fact that for any $n \ge 0$, $T[0,n] \ge T_0$, and by Theorem 3.5 $T_0$
is finite almost surely.

The proof of (b) follows from the definition of T[0, n]. 

We want to prove that the number of steps Algorithm 1 makes before stopping is finite.
Observe that for each t between $T[0,n]$ and 0, the algorithm must do at most $C(|t| + n)$ steps

• to check if $L_s \le s - t$ for any $t \le s \le n$

• and to assign a value to $(X_s, Y_s)$ if this is possible.

In the expression $C(|t| + n)$, C is a fixed positive constant which bounds above the number of
operations we need to perform at each single step.

Therefore the total number of steps Algorithm 1 must do before it stops is bounded above
by

$$\sum_{k=0}^{-T[0,n]} C\,(k + n) = C \left( (-T[0,n] + 1)\, n + \frac{(-T[0,n])\,(-T[0,n] + 1)}{2} \right),$$

which is finite almost surely by (a). This concludes the proof of (c).

Finally, to prove (d) let us suppose that for $t \le 0$ we have

$$L_t = 0,\ L_{t+1} \le 1,\ \ldots,\ L_n \le n - t. \tag{6.3}$$

Then, the choice of $(X_t^n, Y_t^n)$, according to the theoretical construction of Section 3, is
independent of $L_s$, $s < t$.

By definition, $T[0,n] = \sup\{t \le 0;\ L_t = 0, L_{t+1} \le 1, \ldots, L_n \le n - t\}$. By (a) $T[0,n]$ is
almost surely finite. By construction, if $T[0,n] = t$ then

$$(\hat{X}_t^n, \hat{Y}_t^n) = (X_t^n, Y_t^n).$$

□



7 Final comments and reference remarks 

The main contribution of this article is to present an explicit construction of a stationary
coupling between ordered binary chains of infinite order achieving the minimal $\bar{d}$-distance.
Moreover, we show that this explicit construction is feasible, in the sense that it can be realized by
a perfect simulation algorithm which stops almost surely after a finite number of steps.

Theorem 3.6 can be seen as a generalization to the infinite volume setting of results of Kirillov
et al. (1989), who show that the classical coupling introduced by Holley (1974) attains the $\bar{d}$-distance
for finite volume Gibbs states. Besides Kirillov et al. (1989), the only other constructive results
in this field are Ellis (1976, 1978, 1980a, 1980b), which consider the case of Markov chains on
a finite alphabet. Ours seems to be the first constructive solution for chains of infinite order.
Several challenges lie ahead, for instance the problem of finding a constructive solution for
non-binary chains and/or non-ordered pairs of chains, as well as for infinite volume Gibbs measures.

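Our results can be presented as a constructive solution for the Monge-Kantorovich problem
with additive cost function $C : A^{\mathbb{Z}} \times A^{\mathbb{Z}} \to [0, 1]$ defined as follows. For any pair of sequences
$x_{-\infty}^{+\infty}$ and $y_{-\infty}^{+\infty}$,

$$C(x_{-\infty}^{+\infty}, y_{-\infty}^{+\infty}) = \sum_{n \in \mathbb{Z}} c_n |x_n - y_n|,$$

where $(c_n)_{n \in \mathbb{Z}}$ is a sequence of positive real numbers, with $\sum_{n \in \mathbb{Z}} c_n = 1$. This follows
straightforwardly from the following observation, which uses stationarity in the second equality.

$$\inf\left\{ E\big[C(\tilde{X}, \tilde{Y})\big] : (\tilde{X}, \tilde{Y}) \text{ stationary coupling of } X \text{ and } Y \right\}$$
$$= \inf\Big\{ \sum_{n \in \mathbb{Z}} c_n P(\tilde{X}_n \neq \tilde{Y}_n) : (\tilde{X}, \tilde{Y}) \text{ stationary coupling of } X \text{ and } Y \Big\}$$
$$= \inf\Big\{ P(\tilde{X}_0 \neq \tilde{Y}_0) \sum_{n \in \mathbb{Z}} c_n : (\tilde{X}, \tilde{Y}) \text{ stationary coupling of } X \text{ and } Y \Big\}$$
$$= \inf\left\{ P(\tilde{X}_0 \neq \tilde{Y}_0) : (\tilde{X}, \tilde{Y}) \text{ stationary coupling of } X \text{ and } Y \right\}$$
$$= \bar{d}(X, Y).$$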

The Monge-Kantorovich problem (MKP) has attracted a lot of attention recently. However, to the
best of our knowledge, ours are the first results in this direction. The literature on the MKP is
very extensive. We leave the interested reader to find their way, starting with the classical reference
Rachev (1984) and going up to the recent Villani (2009).

Chains of infinite order seem to have been first studied by Onicescu and Mihoc (1935a) who
called them chains with complete connections (chaînes à liaisons complètes). The name chains
of infinite order was coined by Harris (1955). We refer the reader to Iosifescu and Grigorescu
(1990) for a presentation of the classical material. We refer the reader to Fernandez, Ferrari
and Galves (2001) for a self-contained presentation of chains of infinite order including the
representation of chains of infinite order as a countable mixture of finite order Markov chains.

Our Theorem 3.5 is an application to pairs of chains of the results in Comets, Fernandez
and Ferrari (2002). However, our proof of the result is new and we believe it is simpler than
theirs. The representation of chains of infinite order as a countable mixture of Markov chains
of increasing order appears explicitly in Kalikow (1990) and implicitly in Ferrari et al. (2000)
and Comets et al. (2002). Regeneration schemes for chains of infinite order have been obtained
by Berbee (1987) and by Lalley (1986, 2000).

In the literature, the stochastic order between chains that we considered here is
also called domination. We refer the reader to the book of Lindvall (1992) for more on the
subject.

The question of uniqueness of the coupling attaining the $\bar{d}$-distance, even for ordered pairs,
is open. However, it is easy to see that in the iid binary case there is only one stationary
coupling which achieves the minimal $\bar{d}$-distance. The following example was suggested by one
of the referees. Assume that $(X_n)_n$, $(Y_n)_n$ are iid Bernoulli processes with $P(X_n = 1) = q$ and
$P(Y_n = 1) = p$ with $p > q$. Then the only distribution on $(\{0, 1\} \times \{0, 1\})^{\mathbb{Z}}$ which is a coupling
between $(X_n)_n$ and $(Y_n)_n$ and which achieves the $\bar{d}$-distance, in this case $P(X_0 \neq Y_0) = p - q$, is the
stationary product measure given by

$$P((X_n, Y_n) = (1, 1)) = q, \quad P((X_n, Y_n) = (0, 0)) = 1 - p,$$

$$P((X_n, Y_n) = (0, 1)) = p - q, \quad P((X_n, Y_n) = (1, 0)) = 0.$$
To ensure that Algorithm 1 stops after a finite number of steps, conditions weaker
than our Conditions 1 and 3 suffice. This follows from the fact that our Algorithm 1 is inspired by the
one proposed in Comets et al. (2002) in a different context. For details, we refer the reader to
the original article. However, our goal was to sample from a minimal $\bar{d}$-coupling. It is an open
issue whether this can be done under weaker conditions.